Abstract

The multidrug-resistant Streptococcus pneumoniae Taiwan^19F-14, or PMEN14, clone was first observed with a 19F serotype, which is
targeted by the heptavalent polysaccharide conjugate vaccine (PCV7). However, "vaccine escape" PMEN14 isolates with a 19A
serotype became an increasingly important cause of disease post-PCV7. Whole genome sequencing was used to characterize the
recent evolution of 173 pneumococci of, or related to, PMEN14. This suggested that PMEN14 is a single lineage that originated in the
late 1980s in parallel with the acquisition of multiple resistances by close relatives. One of the four detected serotype switches to 19A
generated representatives of the sequence type (ST) 320 isolates that have been highly successful post-PCV7. A second produced an
ST236 19A genotype with reduced resistance to β-lactams owing to alteration of pbp1a and pbp2x sequences through the same
recombination that caused the change in serotype. A third, which generated a mosaic capsule biosynthesis locus, resulted in serotype
19A ST271 isolates. The rapid diversification through homologous recombination seen in the global collection was similarly observed
in the absence of vaccination in a set of isolates from the Maela refugee camp in Thailand, a collection that also allowed variation to be
observed within carriage through longitudinal sampling. This suggests that some pneumococcal genotypes generate a pool of
standing variation that is sufficiently extensive to result in "soft" selective sweeps: the emergence of multiple mutants in parallel
upon a change in selection pressure, such as vaccine introduction. The subsequent competition between these mutants makes this
phenomenon difficult to detect without deep sampling of individual lineages.

Key words: bacterial evolution, recombination, vaccine escape, antibiotic resistance, selective sweeps, phylogenomics. 



Introduction 

Streptococcus pneumoniae (the "pneumococcus") is an
oronasopharyngeal commensal bacterium and respiratory
pathogen representing a common cause of pneumonia,
bacteremia, and meningitis. Global estimates suggest that 
the pneumococcus was responsible for 826,000 deaths in 
children under 5 years in 2000 (O'Brien et al. 2009). A 
major clinical concern over recent decades has been
pneumococcal multidrug resistance (MDR), defined as resistance to
β-lactams and at least two other classes of antibiotic, the first
example of which was detected in 1977 (Jacobs et al. 1978).
Genotyping of MDR pneumococci has suggested that many 
such isolates belong to a small number of internationally dis- 
seminated "clones" of closely related bacteria (Klugman 
2002). It was hoped that their success would be reversed by 
the heptavalent antipneumococcal polysaccharide conjugate 
vaccine (PCV7), which targeted the seven "vaccine-type" 
pneumococcal capsule types (directly corresponding to
serotypes) accounting for the majority of pre-PCV7
β-lactam-resistant isolates from invasive pneumococcal disease in the
United States (Whitney et al. 2000).

Although the incidence of antibiotic-resistant disease has 
typically decreased following PCV7's introduction, owing to 
the elimination of vaccine serotypes, the relative prevalence of 
antibiotic resistance in the pneumococcal population has gen- 
erally not fallen dramatically (Kyaw et al. 2006; Huang et al. 
2009). This is partly a consequence of particular MDR clones 
being associated with multiple capsule types likely as a conse- 
quence of "serotype switching": The acquisition of a novel 
capsule type through exchange of sequence at the capsule 
polysaccharide synthesis (cps) locus, which determines the se- 
rotype. Such a process allows clones generally associated with 
vaccine serotypes to persist in the post-PCV7 environment in 
the form of mutants expressing nonvaccine type capsules 
(Beall et al. 2011; Hanage, Bishop, Huang, et al. 2011;
Simoes et al. 2011).

In the years following PCV7's introduction in the United 
States, the clone in which serotype switching had the biggest 
impact was Taiwan^19F-14, also known as PMEN14. This clone
was originally detected as having the vaccine-type 19F capsule
when first characterized by applying multilocus sequence
typing (MLST) (Aanensen and Spratt 2005) to isolates from a
Taiwanese hospital in 1997 (Shi et al. 1998). These multilocus
sequence type (ST) 236 isolates were found to be resistant to
β-lactams, tetracyclines, and macrolides. Epidemiological
surveillance subsequently identified closely related serotype 19F
isolates in other parts of Southeast Asia (Bogaert et al. 2002;
Ip et al. 2002), South Africa (McGee et al. 2001), and the United
States (Corso et al. 1998). Isolates that appear to be members
of this clone, based on genotyping information, but expressing
the nonvaccine serotype 19A have become similarly widespread.
In Southeast Asia, in the late 1990s, 19A isolates of ST320 were
detected; these represented a double locus variant
(DLV) of ST236, as the two STs shared identical alleles at five of
the seven MLST loci (Farrell et al. 2004; Ko and Song 2004;
Choi et al. 2008). Subsequently, ST320 has become a highly
prevalent multidrug-resistant genotype post-PCV7 in surveys
of carriage (Hanage, Bishop, Lee, et al. 2011) and in cases of
invasive pneumococcal disease (Moore et al. 2008; Beall et al.
2011) in the United States. The 19A capsule type was also
found in isolates with STs 271 (a single locus variant, or SLV, of
ST236; these STs are identical at six of the seven MLST loci)
and 236 itself. In this latter case, isolates were found to be less
resistant to β-lactams than most PMEN14 isolates. Along with
the observation of sequence similarity to a putative donor, this
led to the hypothesis that a capsule-switching recombination
had also altered the linked penicillin-binding protein (PBP)
genes that determine susceptibility to such antibiotics
(Moore et al. 2008).
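
The SLV and DLV relationships described above follow directly from
counting the MLST loci at which two allelic profiles differ. A minimal
sketch of that comparison is given below; the function name and the
allele numbers are hypothetical and chosen only for illustration, and
they are not the actual profiles of ST236, ST271, or ST320.

```python
from typing import Sequence


def mlst_relationship(profile_a: Sequence[int], profile_b: Sequence[int]) -> str:
    """Classify two seven-locus MLST profiles by the number of differing alleles."""
    if len(profile_a) != 7 or len(profile_b) != 7:
        raise ValueError("MLST profiles must contain seven allele numbers")
    differences = sum(a != b for a, b in zip(profile_a, profile_b))
    labels = {0: "identical ST", 1: "SLV", 2: "DLV"}
    return labels.get(differences, f"{differences}-locus variant")


# Hypothetical allele numbers, for illustration only (not real ST profiles).
st_x = [1, 2, 3, 4, 5, 6, 7]
st_y = [1, 2, 3, 4, 5, 6, 9]  # differs from st_x at one locus
st_z = [1, 2, 3, 4, 5, 8, 9]  # differs from st_x at two loci

print(mlst_relationship(st_x, st_y))
print(mlst_relationship(st_x, st_z))
```

As the text goes on to note, such counts group related STs but cannot by
themselves resolve the order in which phenotypes emerged.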

STs 236, 271, and 320 are all grouped within clonal com- 
plex (CC) 320, which also includes non-MDR pneumococci. 
However, this genotyping information is not sufficient to pre- 
cisely reconstruct the pattern by which the MDR phenotype, 
and vaccine escape 19A isolates, emerged. MDR may have 
emerged once, in which case PMEN14 would represent a 
single lineage within which subsequent diversification and 
intermittent reversion to antibiotic susceptibility may have
contributed to the observed diversity. Alternatively, resistance
may have been acquired on multiple occasions by closely
related bacteria, in which case the diversity would represent 
the convergent evolution of susceptible progenitors. Such an 
observation would suggest that some aspect of the genotype 
may predispose it toward becoming a successful MDR 
pneumococcus. 

Phylogenomic analysis can provide the necessary resolution 
to distinguish these alternative hypotheses when isolates of 
different phenotypes are placed in the context of a broader 
sample. Alongside the isolates representing the variation in 
antibiotic resistance and serotype, primarily recovered from 
cases of disease, the collection assembled for this study in- 
cludes CC320 isolates from a survey of carriage in the 
Maela refugee camp in Northwestern Thailand (Turner et al.
2012). CC320 isolates were found to be the most common
genotype in Maela by an independent population-wide survey
(Chewapreecha et al. 2014); hence, this location is able to
provide a "snapshot" of the clone's overall diversity. These
isolates also provide a useful comparison as a bacterial
population not directly subject to selection by vaccine-induced
immunity. This overall collection of bacteria should therefore be
able to distinguish between alternative explanations as to the
emergence of MDR and subsequent instances of vaccine
escape in this set of clinically important pneumococci.

Materials and Methods 

Phylogenomic Analysis 

DNA samples were collected for all isolates of CC320 available 
from the sources listed in supplementary table S1, 
Supplementary Material online. Samples were sequenced on 
the Illumina GAII and HiSeq platforms (supplementary table
S1, Supplementary Material online) as multiplexed libraries.
Short read data were then aligned against the S. pneumoniae
Taiwan^19F-14 genome (Donati et al. 2010) (EMBL accession
code: CP000921) using SMALT v0.6.4 to generate a multiple
genome alignment, from which polymorphic sites were iden- 
tified, as described previously (Croucher et al. 2012). Only 
samples with a mean coverage above 25-fold and unambig- 
uously calling bases at >90% of the positions in the reference 
sequence were used in the analysis. Samples were also ex- 
cluded if they showed signs of contamination, based on fre- 
quency of sites with evidence of multiple alleles, or if their 
serotype and ST (determined as described previously;
Croucher et al. 2011) both significantly deviated from
previously determined epidemiological information (i.e., if isolates
were of a different serogroup and their MLST profile differed 
at two loci or more). 
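
The two numerical inclusion thresholds given above (mean coverage above
25-fold and unambiguous base calls at more than 90% of reference
positions) can be expressed as a simple filter. The sketch below is
illustrative only: the class, field names, and sample values are
assumptions, and it does not attempt the contamination or
serotype/ST consistency checks, which depend on information not
reproduced here.

```python
from dataclasses import dataclass


@dataclass
class SampleQC:
    name: str               # hypothetical sample label
    mean_coverage: float    # mean read depth across the reference
    called_fraction: float  # fraction of reference positions called unambiguously


def passes_mapping_thresholds(sample: SampleQC,
                              min_coverage: float = 25.0,
                              min_called: float = 0.90) -> bool:
    """Return True if the sample exceeds both thresholds stated in the text."""
    return sample.mean_coverage > min_coverage and sample.called_fraction > min_called


# Hypothetical samples, for illustration only.
samples = [
    SampleQC("sample_A", mean_coverage=80.0, called_fraction=0.97),
    SampleQC("sample_B", mean_coverage=20.0, called_fraction=0.95),
    SampleQC("sample_C", mean_coverage=60.0, called_fraction=0.85),
]
retained = [s.name for s in samples if passes_mapping_thresholds(s)]
print(retained)
```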

Isolate 41_PMEN14 represents an independent culture of 
isolate TW31, from which the reference sequence was gen- 
erated. Furthermore, sequences 8561-06 and LMG87 were 
both generated from independent cultures of the same iso- 
late, as were 7848-05 and LMG95. Prediction of recombinant 
sequence and generation of a maximum-likelihood phylogeny 
were then conducted as described in Croucher et al. (2011). In
this analysis, all three pairs of sequences from the same isolate 
were found to be closely related sister leaf nodes in the phy- 
logeny. The same alignment was also analyzed with 
BRATNextGen (Marttinen et al. 2012), assuming four clusters, 
using a learned value of alpha, a window size of 1 kb, and a 
significance threshold P value of 0.05 (as calculated from 100 
permutations). 

A Bayesian coalescent analysis was performed on a subset 
of the alignment corresponding to the PMEN14 clade using 
BEAST (Drummond et al. 2012). Base substitutions predicted 
to have been introduced through recombination were ex- 
cluded from the alignment used in this analysis. The topology 
of the phylogeny was fixed as that of the rooted subtree from 
the overall analysis, with the years of isolation listed in
supplementary table S1, Supplementary Material online, used to
establish a molecular clock based on a general time
reversible substitution model. A relaxed lognormal clock
prior (Drummond et al. 2006) was used for the substitution 
rate and a skyline plot prior (Drummond et al. 2005) was used 
for the population demography. All values were estimated 
with an effective sample size of over 200. 

Accessory Genome Distribution 

For the analysis of cps loci and integrative and conjugative 
elements (ICEs), Illumina sequence reads were assembled de
novo using Velvet (Zerbino and Birney 2008), with scaffolds
generated using SSPACE (Boetzer et al. 2011) and sequence
improvement conducted using the PAGIT pipeline (Swain
et al. 2012). The serotype 19A cps loci displayed in
supplementary figure S5, Supplementary Material online, have been
submitted to the ENA with accession codes HG799504
(isolate 7848-05), HG799505 (isolate 8312-05), and HG799488
(isolate SN28652). The ICEs displayed in supplementary figure
S6, Supplementary Material online, have been submitted to
the ENA with accession codes HG799503 (ICESpSPN28652),
HG799502 (ICESpPT814), and HG799501 (ICESp6027).
Nucleotide sequence comparisons were performed using 
BLAT (Kent 2002) with default settings and analyzed using 
ACT (Carver et al. 2008). 

The distribution of antibiotic resistance genes shown in 
figure 4 reflects the mapping of sequence reads to the
displayed reference sequences identified in the analysis of the
PMEN1 lineage (Croucher et al. 2011) using BWA v0.7.3 (Li
and Durbin 2009). The coverage plots were then generated 
using Samtools (Li et al. 2009) and standardized by dividing 
the coverage at each base by the number of million reads 
generated in the sequencing of the sample. Biopython 
(Cock et al. 2009) was then used to display these data as 
heatmaps. 
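
The standardization step described above, dividing per-base coverage by
the number of million reads generated for the sample, can be sketched
as follows. The function name and input values are hypothetical; in
practice the depths would come from the Samtools output rather than a
hand-written list.

```python
from typing import List, Sequence


def coverage_per_million_reads(depths: Sequence[float], total_reads: int) -> List[float]:
    """Divide per-base read depth by the number of million reads sequenced."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    millions = total_reads / 1_000_000
    return [depth / millions for depth in depths]


# Hypothetical depths over three bases of a resistance gene in a sample
# sequenced to two million reads.
normalized = coverage_per_million_reads([10, 20, 0], total_reads=2_000_000)
print(normalized)
```

Normalizing in this way makes coverage comparable between samples
sequenced to different depths, so that absence of a gene is not
confused with a shallow library.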

Assessment of Potential Sequence Donors 

The regions corresponding to the serotype switching recom- 
binations importing the 19A capsule into SN28652 (ST320) 
and 8312-05 (ST236) were extracted from de novo assem- 
blies. These sequences were then used to search the de novo 
assemblies of the serotype 19A ST199 isolates from a study of
isolates from Massachusetts (Croucher, Finkelstein, et al.
2013) using Basic Local Alignment Search Tool (BLAST)
(Altschul et al. 1990). The best matching isolates were taken
as potential sequence donors, and combined with the
recipients and all complete publicly available genome sequences
(although using only S. pneumoniae OXC141 as a
representative of the CC180 clade; Croucher, Mitchell, et al. 2013) to
represent the species-wide variation in the region flanking the
cps locus. The regions orthologous to that between SPT_0362
and SPT_0391 (dexB) in S. pneumoniae Taiwan^19F-14 (the
31,305 bp 19A Recombinant Region) were extracted from
each genome and aligned using MUGSY (Angiuoli and Salzberg
2011), and a phylogeny was generated using RAxML (Stamatakis
et al. 2005). The same procedure was used to generate the
phylogeny for the 31,850 bp control region, defined by the
boundaries of coding sequences (CDSs) SPT_0438 and
SPT_0473 in S. pneumoniae Taiwan^19F-14.

Antibiotic minimum inhibitory concentrations (MICs) listed
in supplementary table S1, Supplementary Material online,
were provided by the organizations that originally collected
the isolates. For a direct comparison, the penicillin MICs of the
isolates listed in table 1 were retested using an E test in the
same laboratory.

Table 1

MICs of Isolates Acquiring 19A Capsules

Strain     Serotype   ST     Penicillin MIC (µg/ml)
7535-06    19F        236    1
8312-05    19A        236    0.25
SN28306    19F        320    4
SN39039    19A        320    4

Note: These isolates are two pairs, each containing a serotype 19A isolate
generated through capsule switching and the serotype 19F isolate most closely
related to the ancestral sequence prior to the capsule switching recombination. In
the ST236 isolates, the recombination at the capsule locus leads to a fall in
penicillin resistance, because the flanking penicillin-binding protein genes are replaced
by alleles similar to those in penicillin-susceptible isolates. This is not the case in
ST320, where the level of resistance is maintained after the acquisition of the 19A
capsule.

Individual Gene Analyses 

Sequences corresponding to the resistance genes pbp1a, 
pbp2x, pbp2b, dyr, and folP were extracted from de novo 
Velvet (Zerbino and Birney 2008) assemblies of the CC320 
isolates. The protein sequences were aligned using MUSCLE 
(Edgar 2004) and then backtranslated to give a codon align- 
ment. This was then analyzed using BAPS (Tang et al. 2009) to 
provide an estimate of the number of different alleles in the 
collection, which was then used to inform an analysis of the 
alignment using BRATNextGen (Marttinen et al. 2012) with 
alpha fixed at 20 and a window length of 100 bp. 
Recombinant segments were identified using a threshold P 
value of 0.05, as calculated from 100 permutations. 
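
The back-translation step, in which an aligned protein sequence is
converted into a codon alignment, amounts to threading the original
coding sequence through the gapped protein. The sketch below shows the
idea for a single sequence; it is not the authors' script, the function
name is hypothetical, and it assumes an in-frame coding sequence
without a terminal stop codon.

```python
def backtranslate(aligned_protein: str, cds: str) -> str:
    """Thread a coding sequence onto a gapped protein alignment row.

    Each residue is replaced by its original codon and each gap
    character ('-') by a three-base gap ('---').
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of three")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    residues = sum(1 for aa in aligned_protein if aa != "-")
    if residues != len(codons):
        raise ValueError("number of codons does not match number of residues")
    codon_iter = iter(codons)
    return "".join("---" if aa == "-" else next(codon_iter) for aa in aligned_protein)


# Hypothetical two-residue protein aligned with one internal gap.
codon_row = backtranslate("M-K", "ATGAAA")
print(codon_row)
```

Aligning at the protein level and then restoring the nucleotides keeps
gaps in frame, which matters for the window-based recombination
analysis that follows.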

Results 

PMEN14 Is a Rapidly Recombining Lineage 

The sample collection consisted of 175 sequence data sets 
from 173 representatives of CC320 isolated between 1997 
and 2009, containing examples of serotypes 19F, 19A, and 
23F. They originated in 12 countries, the majority coming 
from Southeast Asia but also including representatives from 
the Middle East, Europe, and the United States. All isolates 
were sequenced as multiplexed libraries using the Illumina
platform, generating paired-end reads as detailed in
supplementary table S1, Supplementary Material online. Read pairs
were aligned against the complete reference genome of
S. pneumoniae Taiwan^19F-14, representing isolate TW31
from the original identification of the clone in Taiwan in 
1997 (Donati et al. 2010), and bases called using previously 
defined criteria (Harris et al. 2010). Resequencing of TW31 
identified ten base substitutions relative to the reference 
genome. Similarly, for two isolates sequenced in duplicate in 
this study, one of serotype 19A and the other 19F, the pairs of
consensus sequences were only distinguished by two and 
three polymorphic sites, respectively. The most divergent iso- 
late was found to be SN4691 of ST1584 (a DLV of ST236), 
which was used to root the phylogeny. Excepting this se- 
quence, 46,377 polymorphic sites were identified. This align- 
ment relative to the reference was analyzed using an iterative 
algorithm to simultaneously identify recombinant sequence 
and construct a phylogeny based only on vertically in- 
herited point mutations in the "clonal frame" of the 
genome, as described previously (Croucher et al. 2011) 
(fig. 1). BRATNextGen (Marttinen et al. 2012) produced similar
results when applied to the same alignment (supplementary
fig. S1, Supplementary Material online).

The rooted phylogeny suggested that some of the 
β-lactam-sensitive isolates represented the ancestral
phenotype from which the MDR isolates emerged. These susceptible
isolates were split into two clades: ST236 isolates of serotype
19F (predicted to be the ancestral serotype) formed clade
"19F-PLS," and serotype 19A ST202 and ST3559 isolates
formed clade "19A-PLS." The β-lactam-resistant genotypes
were monophyletic; however, a single ST352 isolate (an SLV 
of ST236) was an outlier to the robustly supported PMEN14 
clade itself (supplementary fig. S2, Supplementary Material 
online). Within the PMEN14 clade, 61,096 base substitutions 
were reconstructed as having occurred across 38,744 poly- 
morphic sites. Of the base substitutions, 58,420 (96%) were 
imported by 451 recombinations, estimating the per site r/m 
statistic (the ratio of base substitutions accumulated by recom- 
bination relative to the number of point mutations) as 21.8. 
This value is far higher than that calculated for PMEN1 using 
the same approach (7.2), partly due to the larger number of 
recombinations per point mutation: 0.17 for PMEN14 as
compared with 0.10 for PMEN1 (Croucher et al. 2011). Many
recombinations were clustered at high densities around the 
cps locus, which included those causing changes of serotype, 
and the pspA gene, which encodes the pneumococcal surface 
protein A antigen and was affected by 16 putative recombi- 
nations. However, the locus encoding the pspC antigen, 
which like pspA is observed to frequently undergo recombi- 
nation in other lineages (Croucher et al. 2011; Croucher, 
Finkelstein, et al. 2013; Chewapreecha et al. 2014), was 
only affected by four recombinations. This may be related to 
the locus being atypical in having two pspC paralogues in 
tandem (Iannelli et al. 2002), an arrangement conserved in
all isolates in the collection. Additionally, the loci annotated 
as mobile genetic elements (MGEs) in the reference sequence 
contributed little to the apparent diversification. Hence, the 
per site r/m statistic only fell to 21.7 when recombinations
occurring in such regions were excluded, although this is still
lower than the same metric calculated from a smaller
independent sample of the same genotype (34.1) (Croucher,
Finkelstein, et al. 2013). These remaining sequence exchanges
outside of MGEs are likely homologous recombinations, and
followed a similar exponential length distribution to that
observed in PMEN1 (rate parameter of 1.16 × 10^-4 per bp; 95%
confidence interval: 1.07 × 10^-4 to 1.27 × 10^-4 per bp;
supplementary fig. S3, Supplementary Material online) (Croucher
et al. 2012).
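
The summary statistics quoted for the PMEN14 clade can be checked from
the counts given above. The short calculation below is only a worked
illustration; the variable names are arbitrary, and the number of point
mutations is taken as the base substitutions not attributed to
recombination.

```python
# Counts reported for the PMEN14 clade.
total_substitutions = 61_096
recombination_substitutions = 58_420
recombination_events = 451

# Substitutions not imported by recombination are treated as point mutations.
point_mutations = total_substitutions - recombination_substitutions

r_over_m = recombination_substitutions / point_mutations
events_per_mutation = recombination_events / point_mutations
recombinant_fraction = recombination_substitutions / total_substitutions

print(point_mutations)
print(round(r_over_m, 1))
print(round(events_per_mutation, 2))
print(round(recombinant_fraction * 100))
```

The values reproduce the per site r/m of 21.8, the 0.17 recombinations
per point mutation, and the 96% of substitutions attributed to
recombination.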

The r/m ratio also reflected the low number of point mu- 
tations (2,676 mutations affecting 2,373 sites) within the 
PMEN14 clade. A root-to-tip distance plot of genetic
divergence over time (n = 164, R^2 = 0.40, P value < 2.2 × 10^-16;
supplementary fig. S4, Supplementary Material online) pro- 
vided significant evidence of a molecular clock. A Bayesian 
coalescent analysis (Drummond et al. 2006) indicated that 
the lineage arose around 1987 (95% credibility interval: 
1981-1991), with a base substitution rate just under four 
base substitutions per year (1.80 × 10^-6 substitutions/site/
year; 95% credibility interval: 1.27 × 10^-6 to 2.20 × 10^-6),
only slightly greater than previously calculated values 
(Croucher et al. 2011; Croucher, Finkelstein, et al. 2013). 
That the r/m is high despite this elevated mutation rate em- 
phasizes how quickly PMEN14 acquires material through 
transformation, resulting in a diverse set of genotypes circu- 
lating globally. 
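
A root-to-tip analysis of the kind summarized above regresses each
isolate's distance from the root against its year of isolation: the
slope estimates the substitution rate and the x-intercept the date of
the root. The sketch below uses ordinary least squares on made-up data
and is not a substitute for the Bayesian coalescent analysis reported
here. The genome length of 2.1 Mb used in the final conversion is an
assumed round figure, not a value taken from the text.

```python
from typing import Sequence, Tuple


def root_to_tip_regression(years: Sequence[float],
                           distances: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of root-to-tip distance on isolation year.

    Returns (slope, estimated root date, R^2).
    """
    n = len(years)
    if n != len(distances) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(years) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    syy = sum((y - mean_y) ** 2 for y in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, distances))
    if sxx == 0:
        raise ValueError("isolation years must not all be identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    root_date = -intercept / slope if slope != 0 else float("nan")
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return slope, root_date, r_squared


def substitutions_per_genome_per_year(rate_per_site: float, genome_length: int) -> float:
    """Convert a per-site rate into substitutions per genome per year."""
    return rate_per_site * genome_length


# Hypothetical isolates: year of isolation and substitutions from the root.
slope, root_date, r_squared = root_to_tip_regression([2000, 2002, 2004], [10, 18, 26])
print(slope, root_date, r_squared)

# Reported per-site rate with an assumed genome length of about 2.1 Mb.
per_genome = substitutions_per_genome_per_year(1.80e-6, 2_100_000)
print(round(per_genome, 2))
```

Under the assumed genome length, the reported per-site rate corresponds
to a little under four substitutions per genome per year, in line with
the figure given in the text.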

Development of Resistance through Transformation 

These extensive levels of homologous recombination led to 
the import of sequences causing resistance to antibiotics 
(fig. 2). Resistance to trimethoprim most often arises from 
I100L substitutions in the Dyr (or FolA) protein (Adrian and 
Klugman 1997). PMEN14 appears to have originally been tri- 
methoprim sensitive and subsequently imported this resis- 
tance mutation on at least ten occasions, each time 
associated with one of the 15 recombinations that affect
the dyr gene. The mutation has also been gained through
recombination by the 19A-PLS isolate PT814, which has
additionally acquired a folP allele with a small insertion at the S61
position, a modification associated with sulphamethoxazole
resistance (Maskell et al. 1997). The other two acquisitions
of sulphamethoxazole-resistant folP alleles seen in the
collection correspond to the divergence of the ST352 isolate and the
emergence of PMEN14. The continuing diversification of FolP
within PMEN14 is clearly supported by the diversity of
resistance-associated insertions in the protein, resulting in folP (like
dyr) being a "hotspot" of recombination (fig. 1).

Resistance to β-lactams in clinical pneumococcal isolates
results from modification of PBPs through incorporation of
heterospecific sequence in the pbp1a, pbp2x, and pbp2b
genes (Dowson et al. 1993; Sibold et al. 1994). The
evolutionary reconstruction (fig. 1) and analysis of gene
sequences (fig. 2) suggested that penicillin resistance was
independently acquired by PMEN14 and ST352. Since the last
common ancestor of both these genotypes, ST352 appears
to have accumulated 31 recombinations encompassing
218 kb, including events affecting pbp1a, pbp2b, and pbp2x
(but not murM or murN). Similarly, on the other branch
leading from the common ancestor of all the MDR isolates to the
last common ancestor of the PMEN14 clade, 35
recombinations affecting 279 kb were predicted to have occurred. The
murMN genes were once more unaffected, whereas the
pbp1a, pbp2x, and pbp2b genes were each again altered
through the import of sequence and subsequently continued
to diversify throughout the PMEN14 clade.

Multiple Instances of Vaccine Escape 

Changes in pbp2x and pbp1a are often associated with
changes in serotype (fig. 2) as they flank the cps locus. Five
serotype switches were identified, one of which involved the
acquisition of the type 23F capsule, another antigen included
in the PCV7 vaccine. This recombination extended over
29.6 kb of the reference sequence and almost entirely
preserved the sequences of pbp1a and pbp2x (fig. 3). The
other four switches resulted in the acquisition of the 19A
serotype, not targeted by the PCV7 vaccine. One of these, a
78.8-kb long exchange that replaces pbp2x as well as an
extensive upstream tract, caused the serotype switch
characteristic of clade 19A-PLS. However, only five amino acid
substitutions were introduced into the 750 aa Pbp2X protein,
none of which altered the β-lactam sensitivity of the isolates.

Within the PMEN14 clade, the shortest capsule switching 
recombination (16.1 kb relative to the reference sequence) 
imported the 19A capsule into ST271. Although this import 
failed to span the entire cps locus, it did affect the wzy poly- 
merase gene, thought to be the crucial determinant distin- 
guishing the 19F and 19A serotypes (Mavroidi et al. 2007). 
The resulting cps locus within this post-PCV7
vaccine-escape lineage, represented by a pair of sequences
corresponding to an isolate from Minnesota in 2005, therefore
appeared to be a mosaic of the ancestral sequences
characteristic of serotype 19F in the 5' region with an imported wzy
gene characteristic of serotype 19A (supplementary fig. S5,
Supplementary Material online). A separate, but nearby,
recombination occurring on the same branch of the phylogeny
changed the pbp2x sequence without affecting β-lactam
resistance. The pbp2x gene was also affected directly by the
other two, longer, recombinations that imported the 19A
capsule type into PMEN14. One, associated with the emergence
of the highly resistant ST320 clade, replaced one resistant
pbp2x allele with another. However, the recombination that
occurred in the ST236 19A isolate S. pneumoniae 8312-05
was the longest at 85.8 kb relative to the reference and
replaced both pbp2x and pbp1a with alleles similar to those
of the sensitive outgroup isolates (fig. 2). In concordance with
previous observations (Moore et al. 2008), testing revealed
that S. pneumoniae 8312-05 had increased susceptibility to
β-lactams (table 1).

The ST199 lineage has been suggested to be the source of 
the 19A cps locus in both ST236 (Moore et al. 2008) and 
ST320 (Pillai et al. 2009). To test these hypotheses, a phylog- 
eny was generated for the 19A Recombinant Region, repre- 
senting an approximately 40-kb stretch of the genome 
immediately upstream of the cps locus that is predicted to 
have been imported along with the cps locus in both the 
ST320 and ST236 switches (fig. 3). In the case of ST236, this 
was found to be very similar to a potential ST199 donor 
(Croucher, Finkelstein, et al. 2013), although such a match 
could not be found for the ST320 isolate. An equivalently 
long stretch of sequence immediately downstream of the 
cps locus, unaffected by either serotype switching recombina- 
tion, was used as a control. In this locus, both 19A isolates 
were most similar to the reference genomes of the PMEN14 
genotype, as expected if they were unaffected by recent hor- 
izontal import of sequence. These data are consistent with the 
hypothesis that the ST236 switch involved a donor of ST199. 

Import of Resistance Cassettes 

Heterogeneity in antibiotic resistance profiles was also gener- 
ated by the movement of MGEs. Tetracycline and macrolide 
resistance genes were acquired on both branches on which 
β-lactam resistance emerged (fig. 4). In both cases, this was
the result of the acquisition of a Tn916-type ICE, carrying a
tetM tetracycline resistance gene, into which a macrolide
resistance cassette had inserted. In PMEN14, the ICE was
inserted shortly downstream of the rpoBC operon and carried
a mega element that encoded a mef/mel macrolide efflux
pump (Del Grosso et al. 2006). In ST352, the Tn916-type
element was inserted downstream of lytA, which encodes the
principal pneumococcal autolysin, and carried a Tn917
macrolide resistance cassette (Shaw and Clewell 1985). This suggests
that ST352 and PMEN14 independently acquired these related
MGEs. A further acquisition of a Tn916-type ICE was observed
in the 19A-PLS isolate S. pneumoniae PT814, where the
transposon was carried within a larger Tn5252-type ICE (Ayoubi
et al. 1991) inserted downstream of the zmpA gene
(supplementary fig. S6, Supplementary Material online).

The presence of mega within PMEN14 since its inception 
contrasts with PMEN1 (originating around 1970) (Croucher 
et al. 2011), likely reflecting macrolide resistance becoming 
common in pneumococci after the emergence of PMEN1 but 
before the emergence of PMEN14 (Appelbaum 1992; 
Baquero et al. 2002). Nevertheless, PMEN14 appears to 
have also acquired the ermB macrolide resistance gene 
within the Omega cassette (Croucher et al. 2011) on four 
occasions, resulting in a "Tn2010" structure (Del Grosso
et al. 2007). One of these instances has persisted through
the clade in which the three serotype switches to 19A
occurred within PMEN14 in this collection. As well as these 
instances of resistance emerging, the mel/mef pump appears 
to have been deleted twice. One of these deletions was rep- 
resented by an isolate carrying an ermB resistance gene, 
whereas in the other the loss led to a clade of three isolates 
becoming sensitive to macrolides (supplementary table S1,
Supplementary Material online). These isolates were from 
the Maela refugee camp (Chewapreecha et al. 2014) and 
formed part of a large clade of ST4414 isolates (labeled 
ML2 in fig. 1) that appeared to represent the dissemination 
of a single clone within the camp. 

Diversity in a Single Unvaccinated Community 

The ML2 clade of 80 isolates was calculated as having an r/m 
of 12.8, demonstrating that PMEN14's high level of sequence 
import is also observed among cocirculating isolates in the 
absence of vaccination. Furthermore, the overall population 
of bacteria from Maela was polyphyletic with respect to the 
rest of the collection (five clades labeled ML1-5 in fig. 1), in- 
dicating that the clone has entered the camp at least five 
times. Based on this minimum, and the dates of isolation, 
each of the five clades seems to have been present within 
the camp in October 2008, with ML2, 4, and 5 all apparently 
cocirculating over the span of more than a year. 

Coexistence of diversity could also be detected on a smaller 
scale. The genomes of eight longitudinally sampled represen- 
tative colonies from a single individual, isolated between July 
2008 and March 2009, revealed 20 polymorphic sites in this 
population from a single nasopharynx (fig. 5A). However, on
detailed investigation, seven of these appeared to represent
low-quality mapping or phase variation, leaving 13
high-confidence polymorphic sites (nonshaded columns in fig. 5A).
Little evidence of polymorphism was identified in isolates ob- 
tained before October; however, four single-nucleotide poly- 
morphisms were observed to be shared between the isolates 
from November to December. Subsequently, these polymor- 
phisms were lost from the February and March samples, 
which shared a third pattern of polymorphisms. The two 
final members of the clade are from different individuals 
around the end of the infant's carriage period: One is from 
the infant's mother and the second is from a nearby house- 
hold, indicating within- and between-household transmission, 
respectively. A second set of longitudinal samples collected 
from a single infant between January and April 2009 
(fig. 5B) again showed that polymorphisms arise and
disappear even in the absence of transmission bottlenecks,
contrasting with the more clocklike accumulation of diversity
over the longer history of the clade. 
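
Identifying polymorphic sites among longitudinally sampled genomes
reduces to finding alignment columns at which more than one confident
base is observed. The sketch below illustrates this on short made-up
sequences; the function name and the choice to ignore 'N' and gap
characters are assumptions, and it does not reproduce the manual
filtering of low-quality mapping or phase-variable sites described
above.

```python
from typing import List, Sequence


def polymorphic_sites(sequences: Sequence[str], ignore: str = "N-") -> List[int]:
    """Return 0-based alignment positions carrying more than one called base."""
    if len({len(seq) for seq in sequences}) > 1:
        raise ValueError("sequences must be aligned to the same length")
    sites = []
    for position, column in enumerate(zip(*sequences)):
        alleles = {base.upper() for base in column} - set(ignore)
        if len(alleles) > 1:
            sites.append(position)
    return sites


# Hypothetical aligned fragments from four colonies of one carriage episode.
colonies = ["ACGTA", "ACGTA", "ATGNA", "ATGCA"]
variable = polymorphic_sites(colonies)
print(variable)
```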

Discussion 

Whole genome sequencing allows a more precise reconstruc- 
tion of the emergence and diversification of the PMEN14 
clade than afforded by MLST and serotyping. Analysis of this
collection reveals that, within ST236, the β-lactam-sensitive
19F isolates represent the persistence of the ancestral
phenotype, from which the MDR isolates were derived;
subsequently, the β-lactam-sensitive ST236 19A isolates were
themselves derived from this MDR genotype. Similarly,
both ST271 and ST352 are SLVs of ST236; despite the 
former differing in serotype, it lies within the PMEN14 clade, 
whereas ST352 represents an independent emergence of the 
MDR phenotype. One step further removed, both ST320 
and ST202 are serotype 19A DLVs of ST236; yet the former 
represents a third switch to 19A within the PMEN14 
clade, whereas ST202 has independently acquired the 19A 
capsule, as well as resistance to tetracycline, sulphamethoxa- 
zole, and trimethoprim, and is quite distantly related to 
PMEN14. 

The requirement for detailed genetic information to resolve 
these ambiguities stems from the emergence of the same 
traits in parallel across the collection. This is typical of "soft" 
selective sweeps affecting the population (Hermisson and 
Pennings 2005). These are characterized by multiple indepen- 
dent origins of beneficial alleles emerging in parallel, resulting 
in more of the ancestral diversity being preserved than in 
"hard" sweeps where the beneficial mutation has a single 
origin. Soft sweeps are more likely when the rate of diversification
and the effective size of the population are high, as in the
case of this successful, rapidly recombining lineage. Such
behavior can also be facilitated by substructuring, as may be 
the case for such a geographically disparate population 
(Pennings and Hermisson 2006). Yet in this collection, evi- 
dence was found of differing genotypes coexisting at the 
scale of a single episode of carriage, based on the disappear- 
ance and reappearance of particular alleles from longitudi- 
nal sampling. Much greater genetic diversity was found to 
cocirculate within a single camp with an area of just 4 km²
(Turner et al. 2012), some of which impacted on antibiotic 
resistance phenotypes as a consequence of the observed di- 
versity of macrolide resistance cassettes, dyr sequences, and 
folP alleles. 
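
The dependence of soft sweeps on the rate of diversification and effective population size can be made concrete with a small calculation. The sketch below assumes the sampling approximation associated with Pennings and Hermisson (2006), in which the probability that a sample of n adaptive alleles traces back to more than one independent origin is 1 − ∏ i/(i + θ) for i = 1, ..., n − 1, where θ is the population-scaled rate at which the beneficial allele arises. Treating acquisition by recombination as analogous to recurrent mutation is an assumption made here for illustration; the values of θ used are arbitrary and were not estimated from this collection.

```python
def prob_soft_sweep(theta, n):
    """Approximate probability that n sampled adaptive alleles derive
    from more than one independent origin.

    theta: population-scaled rate of origin of the beneficial allele
           (assumed, not estimated from the data discussed here).
    n:     number of sampled adaptive alleles.
    """
    if theta < 0 or n < 1:
        raise ValueError("theta must be >= 0 and n must be >= 1")
    p_single_origin = 1.0
    for i in range(1, n):
        p_single_origin *= i / (i + theta)
    return 1.0 - p_single_origin


# Arbitrary illustrative values: deeper sampling and larger theta both
# raise the chance of observing multiple origins in the sample.
for theta in (0.01, 0.1, 1.0):
    for n in (2, 10, 100):
        print(f"theta={theta:<5} n={n:<4} P(soft)={prob_soft_sweep(theta, n):.3f}")
```

Under this approximation the probability rises with both θ and sample size, which is consistent with the argument that deep sampling of a rapidly diversifying lineage is needed to detect a soft sweep.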

Across the wider collection, each of the resistances observed
(to β-lactams, sulpha drugs, tetracycline, and macrolides)
was acquired more than once in this sample of a single
lineage. This suggests that the emergence of the MDR phenotype
may be the product of an ongoing soft selective
sweep. Similarly, following the introduction of PCV7, the
19A capsule is observed to be acquired on three independent
occasions within the PMEN14 clade alone. However, more 
epidemiologically rigorous samples of post-PCV7 population 
structures (Moore et al. 2008; Mahjoub-Messai et al. 2009; 
Song et al. 2009; Beall et al. 2011; Hanage, Bishop, Lee, et al.
2011) and the MLST database itself (Aanensen and Spratt 
2005) suggest that although the initial selective sweeps them- 
selves may be soft, subsequent competition leads to one 
mutant genotype prevailing. Generally, it seems that the
sparser the sampling or the longer the time between
the selective pressure being exerted and samples being col- 
lected, the greater the tendency to infer that a genotype has 
been successful as the lone example of a rare mutation. More 
focussed data sets may reveal that the genotype has actually 
had to out-compete others sharing similar mutations selected 
by the initial sweep, suggesting more stringent selection on 
the ultimately successful genotype than would otherwise be 
expected following a strict "bottleneck." 

Of the diversity represented in this collection, the serotype 
19A ST320 genotype has been the most successful in the US 
post-PCV7. This seems partially contingent on PMEN14's pre- 
PCV7 success relative to the MDR ST352 genotype, the reason 
for which is difficult to establish given this current data set; the 
extensive recombination distinguishing these genotypes may 
have resulted in selectively important differences, or it could 
be the consequence of chance founder effects. In the case of 
the subsequent post-PCV7 competition between different 
backgrounds having acquired the 19A capsule, it is tempting 
to focus on the cps locus itself. The success of ST320 may stem 
from it being the only instance within PMEN14 where the 
entire cps locus is replaced, whereas high-level β-lactam resistance
is retained. The reduction in resistance of the ST236 19A
isolate is likely to be selected against. The mosaic cps locus
found in the ST271 19A clade may also be less "fit" than the 
intact 19F or 19A loci, if the cohesion of cps loci is driven by 
epistasis between different genes within the cluster; this 
would account for its rarity in the United States in 2005, 
and absence from later samples (Beall et al. 2011). 
Nevertheless, the observation of these unsuccessful switches 
to 19A occurring in parallel is interesting in the absence of 
recombinants having acquired one of the many other 
common non-PCV7 serotypes, suggesting that serotype 
switching is nonrandom. 

In conclusion, the dense sampling in this analysis indicates 
that a diverse population of PMEN14, formed through exten- 
sive homologous recombination, can coexist even within a 
small community. This standing variation results in a soft se- 
lective sweep in response to a change in selection pressure, 
which can be observed by deep sampling of a lineage. 
Subsequent competition between the genotypes that persist 
after the initial sweep may then result in a single, successful 
predominant genotype that makes the sweep appear hard. 
This competition after the initial sweep is likely to represent an
important step in selecting the fittest mutants that rise to 
prominence following the introduction of clinical 
interventions. 

Supplementary Material 

Supplementary figures S1-S6 and table S1 are available at
Genome Biology and Evolution online
(http://www.gbe.oxfordjournals.org/).